Application of Refined LSA and MD5 Algorithms in Spam Filtering
Abstract
The paper proposes a spam filtering method that integrates refined Latent Semantic Analysis (LSA) with the Message-Digest Algorithm 5 (MD5) to address common problems in spam filtering, notably the reduced filtering precision and unbalanced filtering efficiency that result from the lack of latent semantic analysis of mail contents. The LSA weighting function is improved by incorporating fuzzy membership, which makes LSA more effective at processing mail contents. In addition, the MD5 algorithm is used to generate an "e-mail fingerprint" for each message, enabling quick matching and efficient, accurate processing of mass-mailing spam. Simulation results support the effectiveness of the method.
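The fingerprint idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the paper builds its fingerprints, so everything below is an assumption made for illustration: the names `email_fingerprint` and `FingerprintFilter` are hypothetical, and the normalization step (lowercasing and collapsing whitespace) is a guess at what a fingerprint might be computed over, not the paper's method.

```python
import hashlib
import re


def email_fingerprint(body):
    """Return the MD5 hex digest of a normalized message body.

    Assumed normalization (not from the paper): lowercase the text and
    collapse every run of whitespace into a single space.
    """
    normalized = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class FingerprintFilter:
    """Hypothetical helper: a set of fingerprints of known spam messages."""

    def __init__(self):
        self._known_spam = set()

    def add_spam(self, body):
        # Store only the digest, so lookups are constant-time set checks.
        self._known_spam.add(email_fingerprint(body))

    def is_known_spam(self, body):
        return email_fingerprint(body) in self._known_spam


spam_filter = FingerprintFilter()
spam_filter.add_spam("Buy cheap watches   NOW!\nLimited offer.")

# Same text with different case and spacing: the fingerprints match.
print(spam_filter.is_known_spam("buy cheap watches now! limited offer."))
# Unrelated message: no match.
print(spam_filter.is_known_spam("Meeting moved to 3 pm."))
```

A digest lookup of this kind only recognizes copies that are identical after normalization; changing a single character produces a different MD5 value, which is presumably why the method pairs fingerprint matching with a content-based LSA stage.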
Similar resources
Non-Parametric Spam Filtering based on kNN and LSA
The paper proposes a non-parametric approach to filtering unsolicited commercial e-mail messages, also known as spam. The text of each email message is represented as an LSA vector, which is then fed into a kNN classifier. The method shows high accuracy on a collection of recent personal email messages. Tests on the standard LINGSPAM collection achieve an accuracy of over 99.65%, which is an impr...
Content-Based Spam Filtering on Video Sharing Social Networks
In this work we are concerned with the detection of spam in video sharing social networks. Specifically, we investigate how much visual content-based analysis can aid in detecting spam in videos. This is a very challenging task, because of the high-level semantic concepts involved; of the assorted nature of social networks, preventing the use of constrained a priori information; and, what is pa...
Filtering Image Spam with Near-Duplicate Detection
A new trend in email spam is the emergence of image spam. Although current anti-spam technologies are quite successful in filtering text-based spam emails, the new image spams are substantially more difficult to detect, as they employ a variety of image creation and randomization algorithms. Spam image creation algorithms are designed to defeat well-known vision algorithms such as optical chara...
A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization
Spam is unwanted email that harms communications around the world. It is a growing problem in personal email, so detecting it is essential. Machine learning is well suited to this problem because its adaptive nature lets it learn the patterns required for classification. Nonetheless, in spam detection, there are a large num...
A Novel Method for Detecting Spam Email using KNN Classification with Spearman Correlation as Distance Measure
E-mail is the most prevalent method of correspondence because of its availability, quick message exchange, and low sending cost. Spam mail is a serious issue affecting this application on today's internet. Spam may contain suspicious URLs, or may ask for financial information such as money exchange information or credit card details. Here comes the scope of filtering spam from legitimate em...
Journal: JCP
Volume: 4
Publication date: 2009